Bayesian Estimation and Inference of Large-Scale Duplications
Abstract
Background: Recent analysis of eukaryotic genomes shows that genome duplication has occurred multiple times in fish, plants and yeast. This thesis presents a Bayesian procedure for estimating and inferring the timing of large-scale duplications. The estimation procedure calculates the expected number of duplications over each edge in a discretized species tree, in order to find sites where many duplications have occurred. A large-scale duplication hypothesis can then be constructed and tested in an extended GSR model, which is a probabilistic model for gene and sequence evolution. The methods are then verified on two yeast data sets, each of which is thought to contain a genome duplication.
Results: The estimation procedure performs well on the first data set and clearly indicates the sites of large-scale duplications. On the second data set the estimate is located in the ancestor of the species previously predicted to contain a genome duplication. It is discussed whether this is caused by an inherent assumption of the estimation procedure, but the question cannot be resolved without further studies. Several other improvements to the estimation procedure are discussed as well.
Conclusions: The method performs well despite relatively small data sets, and the assumption of the estimation procedure can be relaxed by normalizing the expected number of duplications appropriately. In other cases the estimation procedure can be used to generate qualitative hypotheses of large-scale duplication events. Although this thesis suggests a method for testing large-scale duplication hypotheses in the GSR model, the method remains to be evaluated experimentally.
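
The per-edge estimate described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes that posterior samples of reconciliations are already available for each gene family, that each sample simply maps a species-tree edge to a duplication count, and that normalization divides by edge length. The function names, the data layout and the choice of normalizer are all hypothetical.

```python
from collections import defaultdict


def expected_duplications(posterior_samples):
    """Sum, over gene families, the posterior mean duplication count per edge.

    posterior_samples: dict mapping a gene family name to a list of sampled
    reconciliations; each sample is a dict {edge: duplication count}.
    This data layout is an assumption made for illustration.
    """
    totals = defaultdict(float)
    for samples in posterior_samples.values():
        if not samples:
            continue  # a family without samples contributes nothing
        for sample in samples:
            for edge, count in sample.items():
                # Each sample carries equal weight 1 / len(samples).
                totals[edge] += count / len(samples)
    return dict(totals)


def normalize_by_length(expected, edge_lengths):
    """Divide each expected count by its edge length.

    Edge length is an assumed normalizer; the abstract does not say which
    normalization is appropriate.
    """
    return {edge: value / edge_lengths[edge] for edge, value in expected.items()}


# Toy input: two gene families, two sampled reconciliations each.
samples = {
    "famA": [{"e1": 1}, {"e1": 1, "e2": 1}],
    "famB": [{"e1": 1}, {"e2": 1}],
}
expected = expected_duplications(samples)
rates = normalize_by_length(expected, {"e1": 0.5, "e2": 2.0})
```

Under these assumptions, an edge whose expected count or rate stands out from the rest would be a candidate site for a large-scale duplication hypothesis.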
Similar resources
Classical and Bayesian Inference in Two Parameter Exponential Distribution with Randomly Censored Data
Abstract. This paper deals with the classical and Bayesian estimation for two parameter exponential distribution having scale and location parameters with randomly censored data. The censoring time is also assumed to follow a two parameter exponential distribution with different scale but same location parameter. The main stress is on the location parameter in this paper. This parameter has not...
Inference of Markov Chain: A Review on Model Comparison, Bayesian Estimation and Rate of Entropy
This article has no abstract.
Bayesian Inference for Spatial Beta Generalized Linear Mixed Models
In some applications, the response variable assumes values in the unit interval. The standard linear regression model is not appropriate for modelling this type of data because the normality assumption is not met. Alternatively, the beta regression model has been introduced to analyze such observations. A beta distribution represents a flexible density family on (0, 1) interval that covers symm...
Inference on Pr(X > Y) Based on Record Values From the Power Hazard Rate Distribution
In this article, we consider the problem of estimating the stress-strength reliability $Pr (X > Y)$ based on upper record values when $X$ and $Y$ are two independent but not identically distributed random variables from the power hazard rate distribution with common scale parameter $k$. When the parameter $k$ is known, the maximum likelihood estimator (MLE), the approximate Bayes estimator and ...
Bayesian Estimation of Parameters in the Exponentiated Gumbel Distribution
Abstract: The Exponentiated Gumbel (EG) distribution has been proposed to capture some aspects of the data that the Gumbel distribution fails to specify. In this paper, we estimate the EG's parameters in the Bayesian framework. We consider a 2-level hierarchical structure for prior distribution. As the posterior distributions do not admit a closed form, we do an approximated inference by using ...